Identifying the Most Suitable Stemmer for the CHiC Multilingual Ad-hoc Task

Authors

  • Thomas Wilhelm-Stein
  • Benjamin Schürer
  • Maximilian Eibl

Abstract

Because the 2013 Cultural Heritage in CLEF (CHiC) lab focused on multilingual retrieval, our goals were to integrate Apache Solr into our Xtrieval framework and to evaluate the stemmers available for most of the relevant languages. As there were thirteen languages to cover, we looked for a generic stemmer that works with all of them. We experimented with four setups: one without any stemmer, two using mainly rule-based stemmers, and one using a dictionary-based stemmer. For the dictionary-based setup we employed the HunSpell stemmer, which uses the same dictionaries as OpenOffice.
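The setups differ only in how each token is reduced before indexing. The following Python sketch contrasts no stemming, rule-based suffix stripping, and Hunspell-style dictionary lookup on toy English data. It is a hypothetical illustration, not the authors' Xtrieval/Solr code: the suffix list, the dictionary entries and affix flags, and the names `rule_stem`, `dict_stem` and `analyze` are all assumptions. The two rule-based setups are collapsed into a single toy stemmer because the abstract does not name the stemmers used.

```python
"""Toy comparison of stemming setups (hypothetical data, illustration only)."""

# Setup "none": tokens are only lowercased.
def no_stem(token):
    return token.lower()


# Setup "rule": toy longest-suffix-first stripping (NOT Snowball/Porter).
RULE_SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")
MIN_STEM_LEN = 3  # never strip a suffix if fewer than 3 characters would remain


def rule_stem(token):
    word = token.lower()
    for suffix in RULE_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


# Setup "dictionary": Hunspell-style lookup. DICTIONARY mimics a .dic file
# (stem -> affix flags); SUFFIX_RULES mimics SFX lines in an .aff file
# (flag -> list of (strip, append) pairs). All entries are made up.
DICTIONARY = {
    "museum": {"S"},
    "library": {"S"},
    "collection": {"S"},
    "paint": {"G", "D"},
}
SUFFIX_RULES = {
    "S": [("", "s"), ("y", "ies")],
    "G": [("", "ing")],
    "D": [("", "ed")],
}


def dict_stem(token):
    word = token.lower()
    if word in DICTIONARY:
        return [word]
    stems = []
    for flag, rules in SUFFIX_RULES.items():
        for strip, append in rules:
            if word.endswith(append):
                # Undo the rule: drop the appended part, restore the stripped part.
                candidate = word[: len(word) - len(append)] + strip
                # Accept the candidate only if the dictionary allows this flag on it.
                if flag in DICTIONARY.get(candidate, set()) and candidate not in stems:
                    stems.append(candidate)
    # Unknown words are returned unchanged (an assumption of this sketch).
    return stems or [word]


SETUPS = {
    "none": lambda t: [no_stem(t)],
    "rule": lambda t: [rule_stem(t)],
    "dictionary": dict_stem,
}


def analyze(text, setup):
    """Whitespace-tokenize text and stem each token with the chosen setup."""
    terms = []
    for token in text.split():
        terms.extend(SETUPS[setup](token))
    return terms


for name in SETUPS:
    print(name, analyze("Library libraries painting", name))
# none ['library', 'libraries', 'painting']
# rule ['library', 'librari', 'paint']
# dictionary ['library', 'library', 'paint']
```

With this toy data the rule-based stemmer turns "libraries" into the non-word "librari" and so fails to conflate it with "library", while the dictionary lookup maps both to "library". The trade-off is coverage: the dictionary-based setup can only reduce words its dictionary and affix rules know about. In Solr, dictionary-based stemming of this kind is provided by `HunspellStemFilterFactory`, which reads a Hunspell `.dic`/`.aff` pair.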

Similar articles

Cultural Heritage in CLEF (CHiC) 2013 - Multilingual Task Overview

The Cultural Heritage in CLEF 2013 multilingual task comprised two sub-tasks: multilingual ad-hoc retrieval and semantic enrichment. The multilingual ad-hoc retrieval sub-task evaluated retrieval experiments in 13 languages (Dutch, English, German, Greek, Finnish, French, Hungarian, Italian, Norwegian, Polish, Slovenian, Spanish, Swedish). More than 140,000 documents were assessed for relevance...

Cultural Heritage in CLEF (CHiC) 2013

The Cultural Heritage in CLEF 2013 lab comprised three tasks: multilingual ad-hoc retrieval and semantic enrichment in 13 languages (Dutch, English, German, Greek, Finnish, French, Hungarian, Italian, Norwegian, Polish, Slovenian, Spanish, and Swedish), Polish ad-hoc retrieval and the interactive task, which studied user behavior via log analysis and questionnaires. For the multilingual and Pol...

CEA LIST's Participation at the CLEF CHiC 2013

For our first participation in the CLEF CHiC Lab, we submitted runs to the multilingual ad-hoc and multilingual semantic enrichment tasks. Given the strong multilingual character of the evaluation corpus, the main objectives of the experiments were to test the efficiency of semantic topic expansion and consolidation based on Explicit Semantic Analysis (ESA) versions in different languages. Anot...

Multimedia Information Modeling and Retrieval (MRIM) / Laboratoire d'Informatique de Grenoble (LIG) at CHiC2013

Numerous cultural heritage materials are accessible through online digital library portals. However, this conversion has introduced issues of inconsistency and incompleteness. The Cultural Heritage in CLEF 2013 (CHiC) lab has taken the initiative to organize an evaluation campaign involving several tasks: 1) the multilingual task, 2) the Polish task, and 3) the interactive task. We present the results o...

The Sheffield and Basque Country Universities Entry to CHiC: Using Random Walks and Similarity to Access Cultural Heritage

The Cultural Heritage in CLEF 2012 (CHiC) pilot evaluation included these tasks: ad-hoc retrieval, semantic enrichment and variability tasks. At CHiC 2012, the University of Sheffield and the University of the Basque Country submitted a joint entry, attempting the three English monolingual tasks. For the ad-hoc task, the baseline approach used the Indri search engine. Query expansion approaches...

Publication date: 2013